Orange County
- North America > United States > North Carolina > Durham County > Durham (0.04)
- North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > Experimental Study (0.68)
- Research Report > Strength High (0.46)
- Research Report > New Finding (0.46)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- South America > Peru > Cusco Department > Cusco Province > Cusco (0.04)
- Asia > Japan (0.04)
- Health & Medicine (1.00)
- Law (0.67)
- Education > Educational Setting (0.46)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)
Achieving Constant Regret in Linear Markov Decision Processes
We study constant regret guarantees in reinforcement learning (RL). Our objective is to design an algorithm that, with high probability, incurs only finite regret over infinitely many episodes. We introduce an algorithm, Cert-LSVI-UCB, for misspecified linear Markov decision processes (MDPs), in which both the transition kernel and the reward function can be approximated by linear functions up to a misspecification level ζ. At the core of Cert-LSVI-UCB is an innovative certified estimator, which facilitates a fine-grained concentration analysis for multi-phase value-targeted regression and enables us to establish an instance-dependent regret bound that is constant with respect to the number of episodes.
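Cert-LSVI-UCB builds on least-squares value iteration with an upper-confidence bonus (LSVI-UCB) for linear MDPs. The sketch below shows only that standard building block, one optimistic backup that regresses r + V_{h+1}(s') on features φ(s, a) and adds an elliptical bonus. It does not implement the certified estimator or the multi-phase regression described above. The function name and the parameters `lam` (ridge λ), `beta` (bonus width) and `H` (horizon cap) are hypothetical choices for illustration.

```python
import numpy as np


def lsvi_ucb_backup(phi, rewards, next_values, lam=1.0, beta=1.0, H=1.0):
    """One optimistic least-squares backup in the style of LSVI-UCB (sketch).

    Assumed inputs (hypothetical names, for illustration only):
      phi         -- (n, d) array of features phi(s_i, a_i) for observed pairs
      rewards     -- (n,) array of observed rewards r_i
      next_values -- (n,) array of current estimates V_{h+1}(s'_i)
      lam         -- ridge regularization lambda
      beta        -- width of the confidence bonus
      H           -- horizon, used to cap optimistic values
    """
    d = phi.shape[1]
    # Regularized Gram matrix: Lambda = lam * I + sum_i phi_i phi_i^T.
    gram = lam * np.eye(d) + phi.T @ phi
    # Value-targeted regression: regress r_i + V_{h+1}(s'_i) on phi_i.
    targets = rewards + next_values
    w = np.linalg.solve(gram, phi.T @ targets)
    gram_inv = np.linalg.inv(gram)

    def q_optimistic(feat):
        # Elliptical bonus beta * ||phi(s, a)||_{Lambda^{-1}}, capped at H.
        bonus = beta * np.sqrt(feat @ gram_inv @ feat)
        return min(float(feat @ w + bonus), H)

    return w, q_optimistic


# Usage on synthetic data whose regression targets are exactly linear in phi.
rng = np.random.default_rng(0)
phi = rng.normal(size=(200, 3))
w_true = np.array([0.5, -0.2, 0.1])
rewards = 0.5 * (phi @ w_true)
next_values = 0.5 * (phi @ w_true)
w, q = lsvi_ucb_backup(phi, rewards, next_values, lam=1e-6, beta=1.0, H=10.0)
print(np.round(w, 3))
```

Under misspecification the targets are only approximately linear, which is where the level ζ enters the analysis; this sketch assumes exact linearity for simplicity.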
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Illinois > Champaign County > Champaign (0.04)
- Health & Medicine (0.45)
- Food & Agriculture (0.45)
- Government (0.45)
- North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Europe > United Kingdom (0.04)
- Asia > China > Chongqing Province > Chongqing (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
- Health & Medicine > Health Care Technology (1.00)
- Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.46)
Provable Offline Reinforcement Learning for Structured Cyclic MDPs
Lee, Kyungbok; Sarteau, Angelica Cristello; Kosorok, Michael R.
We introduce a novel cyclic Markov decision process (MDP) framework for multi-step decision problems whose dynamics, transitions, and discount factors differ from stage to stage across the cycle. In this setting, offline learning is challenging: optimizing the policy at any stage shifts the state distributions of subsequent stages, propagating distribution mismatch around the cycle. To address this, we propose a modular structural framework that decomposes the cyclic process into stage-wise sub-problems. The principle is generally applicable; we instantiate it as CycleFQI, an extension of fitted Q-iteration that enables theoretical analysis and interpretation. CycleFQI uses a vector of stage-specific Q-functions to capture both the sequences within each stage and the transitions between stages. This modular design enables partial control, allowing some stages to be optimized while others follow predefined policies. We establish finite-sample suboptimality error bounds and derive global convergence rates under Besov regularity, showing that CycleFQI mitigates the curse of dimensionality relative to monolithic baselines. Additionally, we propose a sieve-based method for asymptotic inference on optimal policy values under a margin condition. Experiments on simulated data and real-world Type 1 Diabetes data demonstrate CycleFQI's effectiveness.
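To make the stage-wise decomposition concrete, here is a minimal tabular sketch in the spirit of CycleFQI, not the authors' implementation. Each stage k keeps its own Q-table and discount factor γ_k, the regression step is replaced by per-(s, a) averaging, and stages listed in a hypothetical `fixed_policies` mapping are evaluated under a given policy instead of being maximized (partial control). The transition format `(k, s, a, r, k_next, s_next)`, the convention that `gammas[k]` discounts steps taken in stage k, and all function names are assumptions for illustration.

```python
def cycle_fqi(transitions, n_stages, actions, gammas, fixed_policies=None, n_iters=50):
    """Tabular fitted Q-iteration with one Q-table per stage of a cycle (sketch).

    transitions    -- list of (k, s, a, r, k_next, s_next) tuples (assumed format):
                      k is the current stage and k_next the stage after the step,
                      equal to k within a stage or (k + 1) % n_stages at a boundary.
    gammas         -- per-stage discount factors; gammas[k] discounts steps in stage k.
    fixed_policies -- optional {stage: policy(s) -> action}; these stages follow the
                      given policy instead of being optimized.
    """
    fixed_policies = fixed_policies or {}
    Q = [{} for _ in range(n_stages)]
    for _ in range(n_iters):
        sums = [{} for _ in range(n_stages)]
        counts = [{} for _ in range(n_stages)]
        for k, s, a, r, k_next, s_next in transitions:
            q_next = Q[k_next]
            if k_next in fixed_policies:
                # Partial control: follow the predefined policy at the next stage.
                v_next = q_next.get((s_next, fixed_policies[k_next](s_next)), 0.0)
            else:
                # Greedy backup over actions already observed at (k_next, s_next).
                v_next = max((q_next[(s_next, b)] for b in actions if (s_next, b) in q_next),
                             default=0.0)
            key = (s, a)
            sums[k][key] = sums[k].get(key, 0.0) + r + gammas[k] * v_next
            counts[k][key] = counts[k].get(key, 0) + 1
        # "Regression" step: plain averaging of the targets for each (s, a) per stage.
        Q = [{key: sums[k][key] / counts[k][key] for key in sums[k]} for k in range(n_stages)]
    return Q


# Toy 2-stage cycle with a single state "s": in stage 0, action 1 pays 1 and action 0
# pays 0; stage 1 has one action with no reward. Each step moves to the other stage.
data = [
    (0, "s", 0, 0.0, 1, "s"),
    (0, "s", 1, 1.0, 1, "s"),
    (1, "s", 0, 0.0, 0, "s"),
]
Q = cycle_fqi(data, n_stages=2, actions=[0, 1], gammas=[0.5, 0.5])
print(Q[0][("s", 1)])  # close to 4/3, the fixed point 1 / (1 - 0.5 * 0.5)
```

Swapping the averaging step for a per-stage function approximator recovers the general fitted Q-iteration pattern; the toy example only checks that the stage-specific backups reach the expected fixed point.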
- North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
- North America > United States > Tennessee > Davidson County > Nashville (0.04)
- Europe > Portugal > Porto > Porto (0.04)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
- Education > Health & Safety > School Nutrition (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)
- North America > United States > New York (0.04)
- Asia > China > Hubei Province > Wuhan (0.04)
- North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
- Europe (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)